Dataset Format
The `CogKit/quickstart/data` directory contains dataset templates for fine-tuning different models. Refer to the template that matches your task type:
Text-to-Image (t2i)
- Each directory contains a set of image files (`.png`)
- The `metadata.jsonl` file contains text descriptions for each image:
  `{"file_name": "example.png", "prompt": "Detailed image description text..."}`
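If your prompts already live somewhere else (a spreadsheet, a captioning script), a small script can produce `metadata.jsonl` in the format above. The sketch below uses only the Python standard library and assumes the prompts are available as a dict that maps file name to prompt. `write_t2i_metadata` is a hypothetical helper for illustration, not part of CogKit.

```python
import json
from pathlib import Path


def write_t2i_metadata(image_dir, prompts):
    """Write image_dir/metadata.jsonl with one JSON object per .png file.

    Hypothetical helper (not a CogKit API). `prompts` maps file name -> prompt.
    Raises ValueError if any .png file has no prompt, so no image is left undescribed.
    """
    image_dir = Path(image_dir)
    images = sorted(p.name for p in image_dir.glob("*.png"))
    missing = [name for name in images if name not in prompts]
    if missing:
        raise ValueError(f"no prompt for: {missing}")
    with open(image_dir / "metadata.jsonl", "w", encoding="utf-8") as f:
        for name in images:
            # One JSON object per line, using the same keys as the template
            record = {"file_name": name, "prompt": prompts[name]}
            # ensure_ascii=False keeps non-ASCII prompts human-readable
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(images)
```

Failing loudly on a missing prompt is a deliberate choice in this sketch: an image silently left out of `metadata.jsonl` is easy to miss until training.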
Text-to-Video (t2v)
- Each directory contains a set of video files (`.mp4`)
- The `metadata.jsonl` file contains text descriptions for each video:
  `{"file_name": "example.mp4", "prompt": "Detailed video description text..."}`
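Before starting a run, it can help to check that every line of `metadata.jsonl` points at a real video and that no video is left without a prompt. The following is a minimal sketch under the layout described above; `check_t2v_metadata` is a hypothetical name and does not reflect how CogKit itself validates data.

```python
import json
from pathlib import Path


def check_t2v_metadata(video_dir):
    """Return a list of problems found in video_dir/metadata.jsonl (empty if none).

    Hypothetical checker, not part of CogKit. Assumes .mp4 files sit next to a
    metadata.jsonl whose lines carry "file_name" and "prompt" keys.
    """
    video_dir = Path(video_dir)
    problems = []
    described = set()
    with open(video_dir / "metadata.jsonl", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            name = record.get("file_name", "")
            if not (video_dir / name).is_file():
                problems.append(f"line {lineno}: missing file {name!r}")
            if not record.get("prompt"):
                problems.append(f"line {lineno}: empty prompt")
            described.add(name)
    # Videos on disk that no metadata line describes
    for path in sorted(video_dir.glob("*.mp4")):
        if path.name not in described:
            problems.append(f"no metadata for {path.name!r}")
    return problems
```

The same check works for the t2i layout if the glob pattern is changed to `*.png`.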
Image-to-Video (i2v)
- The dataset is organized with the following structure:
  - `train/` and `test/` directories, each containing:
    - a `videos/` directory for video files (`.mp4`)
    - an `images/` directory for input image files (`.png`)
    - a `metadata.jsonl` file in the root containing prompt descriptions
- The main `metadata.jsonl` file in the root directory contains prompt information for each sample:
  `{"id": 0, "prompt": "Detailed video description text..."}`
  `{"id": 1, "prompt": "Detailed video description text..."}`
- The `videos/metadata.jsonl` file maps video files to their corresponding IDs:
  `{"file_name": "example.mp4", "id": 0}`
- The `images/metadata.jsonl` file maps image files to their corresponding IDs:
  `{"file_name": "example.png", "id": 0}`
- Image and video files are linked by sharing the same ID
- If image files are not provided, the system will default to using the first frame of the corresponding video as the input image
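To make the ID-based linking concrete, the sketch below joins the three metadata files of one split (for example `train/`) into a list of samples. It is a hypothetical loader written only to illustrate the layout, not CogKit's own code. A missing image is represented as `image=None`, standing in for "use the first frame of the video"; the sketch also assumes the `images/` directory may be absent entirely.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_i2v_samples(split_dir):
    """Join metadata.jsonl, videos/metadata.jsonl and images/metadata.jsonl by "id".

    Hypothetical loader, not CogKit's implementation. image=None means no
    image was provided, so the first video frame would be used instead.
    Raises KeyError if a prompt id has no matching video.
    """
    split_dir = Path(split_dir)
    prompts = {r["id"]: r["prompt"] for r in read_jsonl(split_dir / "metadata.jsonl")}
    videos = {r["id"]: r["file_name"] for r in read_jsonl(split_dir / "videos" / "metadata.jsonl")}
    images = {}
    images_meta = split_dir / "images" / "metadata.jsonl"
    if images_meta.is_file():  # assumption: the images/ directory is optional
        images = {r["id"]: r["file_name"] for r in read_jsonl(images_meta)}

    samples = []
    for sample_id, prompt in sorted(prompts.items()):
        image = images.get(sample_id)
        samples.append({
            "id": sample_id,
            "prompt": prompt,
            "video": split_dir / "videos" / videos[sample_id],
            "image": split_dir / "images" / image if image else None,
        })
    return samples
```

Keying every file by `id` rather than by file name lets the video and image files use unrelated names, which matches the separate `videos/metadata.jsonl` and `images/metadata.jsonl` mappings described above.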
Notes
- Training sets (`train/`) are used for model training; test sets (`test/`) are used to evaluate model performance
- Each dataset generates a `.cache/` directory during training to store preprocessed data. If the dataset changes, you must manually delete this directory before retraining.